Algorithm and Hardware for a Merge Sort Using Multiple Processors

Author

  • Stephen Todd
Abstract

An algorithm is described that allows log(n) processors to sort n records in just over 2n write cycles, together with suitable hardware to support the algorithm. The algorithm is a parallel version of the straight merge sort. The passes of the merge sort are run overlapped, with each pass supported by a separate processor. The intermediate files of a serial merge sort are replaced by first-in first-out queues. The processors and queues may be implemented in conventional solid logic technology or in bubble technology. A hybrid technology is also appropriate.

Introduction

Most conventional sorting algorithms operate on a single processor and require of order n · log(n) cycles to sort n records. Examples are the merge sort [1, pp. 163-165] and quicksort [1, pp. 114-116]. There are single processor algorithms with sort times proportional to n, but these are only effective in certain circumstances. Address sorting [1, pp. 99-102] requires the spread of sort key values to be known and fairly random. Digital sorting [1, p. 170] is very good for main storage sorts of files with short keys. When secondary storage is used, the digit length has to be small to reduce the number of open files; the key then consists of many digits and the constant of proportionality of the sort is high.

A variety of multiple processor sorts exists, most of which require a very large number of processors, proportional to n or more. These are the network sorts [1, pp. 220-243], in particular Batcher's merge exchange sort [1, pp. 111-114], Thompson and Kung's mesh sorts [2], and Chen's parallel bubble sort [3]. Some of these sorts are very fast, but all require very special hardware and are impracticable for large files with current technology. Even proposed a sort using ⌈log₂ n⌉ processors and 4 · ⌈log₂ n⌉ tape units to sort in 3 · 2^⌈log₂ n⌉ write cycles [4]. This sort is made very complicated by the necessity of rewinding tapes before they can be read.
We present a sort that is similar to Even's. It uses more sophisticated hardware, which makes it both faster and simpler. The basic algorithm permits ⌈log₂ n⌉ + 1 processors to sort n records in 2n + log₂ n − 1 write cycles. This requires the storage of 2 log₂ n intermediate queues of variable length and maximum total length n records. These can be implemented using conventional main storage or shift register (e.g., bubble) storage. Our queues differ from Even's tapes in that they can be read before they have been fully written, and no rewind is needed. There are variations on our basic algorithm requiring fewer resources.

The proposed sort is suitable for use when several processors are available, but not order n or more. Very simple processors, which are only required to do a merge, can be used. Our sort is faster for sorting general files than single processor sorts, but not as fast as the network sorts. Our sort could be used in a low cost special purpose sorting machine. Sorting is traditionally used in batch processing and also now in efficient implementations of relational query systems (e.g., [5]). Our sort would form a natural part of a relational data base machine [6].

The algorithm is a variant of a straight merge sort [1, pp. 163-165]. The passes are run overlapped rather than serially. Each pass is supported by a separate processor. Reading from the output of one pass begins before the writing of that output is complete, so the intermediate structures are first-in first-out queues rather than files. When the number of records to be sorted is not an exact power of 2, the normal serial algorithm deals with the remainder at the end of each pass; our algorithm deals with it first. There are several variations of the algorithm that are more suitable in certain circumstances. A multi-way merge sort reduces the number of processors.
Small sections of data can be sorted before being introduced to the ...

Copyright 1978 by International Business Machines Corporation. Copying is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract may be used without further permission in computer-based and other information-service systems. Permission to republish other excerpts should be obtained from the Editor.

IBM J. RES. DEVELOP. • VOL. 22 • NO. 5 • SEPTEMBER 1978

Figure 1 Four processors connected by six queues sort eight records. An overview of the sorting process is shown at (a), and a snapshot of the sort after seven cycles is shown at (b). The * indicates a break between strings of records.
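The overlapped passes described above can be illustrated with a small cycle-by-cycle simulation. This is only a sketch under stated assumptions, not the paper's hardware design: the function name `pipelined_merge_sort` and all variable names are hypothetical, the number of records is assumed to be an exact power of 2 (the remainder handling is not modelled), and each processor is assumed to write at most one record per cycle and to read a record no earlier than the cycle after it was written.

```python
from collections import deque


def pipelined_merge_sort(records):
    """Cycle-by-cycle simulation of overlapped merge passes (illustrative only).

    Assumption: len(records) is a power of two; remainder handling is not modelled.
    Returns (sorted_records, write_cycles).
    """
    n = len(records)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch needs a power-of-two number of records")
    k = n.bit_length() - 1  # number of merge passes, log2(n)

    # Processor 0 splits the input into strings of length 1 on two queues;
    # processors 1..k each run one merge pass; processor k writes the output.
    output = deque()
    inputs = [[deque(records)]] + [[deque(), deque()] for _ in range(k)]
    outputs = inputs[1:] + [[output]]
    run = [1] + [2 ** (i - 1) for i in range(1, k + 1)]  # input string length
    remaining = [[run[i]] * len(inputs[i]) for i in range(k + 1)]
    target = [0] * (k + 1)  # which output queue each processor is writing to

    cycles = 0
    while len(output) < n:
        cycles += 1
        # Visit the last pass first, so a record written in this cycle can
        # only be read by the next processor in the following cycle.
        for i in range(k, -1, -1):
            live = [j for j, left in enumerate(remaining[i]) if left > 0]
            if any(not inputs[i][j] for j in live):
                continue  # stall: a record needed for the comparison is missing
            j = min(live, key=lambda q: inputs[i][q][0])
            outputs[i][target[i]].append(inputs[i][j].popleft())
            remaining[i][j] -= 1
            if not any(remaining[i]):  # output string complete
                remaining[i] = [run[i]] * len(inputs[i])
                target[i] = (target[i] + 1) % len(outputs[i])
    return list(output), cycles


if __name__ == "__main__":
    print(pipelined_merge_sort([5, 3, 8, 1, 7, 2, 6, 4]))
```

With eight records the sketch uses four processors and six intermediate queues, as in Figure 1, and under the timing assumptions above it finishes in 18 write cycles, which is just over 2n. Each merge processor starts as soon as one input queue holds a full string and the other holds its first record, which is what lets the passes overlap.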


Similar resources

Parallel Merge Sort for Distributed Memory Architectures

Cole presented a parallel merge sort for the PRAM model that performs in O(log n) parallel steps using n processors. He gave an algorithm for the CREW PRAM model for which the constant in the running time is small. He also gave a more complex version of the algorithm for the EREW PRAM; the constant factor in the running time is still moderate but not as small. In this paper we give an approach to imp...


A high-performance sorting algorithm for multicore single-instruction multiple-data processors

Many sorting algorithms have been studied in the past, but there are only a few algorithms that can effectively exploit both SIMD instructions and thread-level parallelism. In this paper, we propose a new high-performance sorting algorithm, called Aligned-Access sort (AA-sort), for exploiting both the SIMD instructions and thread-level parallelism available on today's multicore processors. Our ...


Percentile Finding Algorithm for Multiple Sorted Runs

External sorting is frequently used by relational database systems for building indexes on tables, ordered retrieval, duplicate elimination, joins, subqueries, grouping, and aggregation; it would be quite beneficial to parallelize this function. Previous parallel external sorting algorithms found in the database literature used a sequential merge as the final stage of the parallel sort. This re...


Sorting on a Massively Parallel System Using a Library of Basic Primitives: Modeling and Experimental Results

We present a comparative study of implementations of the following sorting algorithms on the Parsytec SC320 reconfigurable, asynchronous, massively parallel MIMD machine: Bitonic Sort, Odd-Even Merge Sort, Odd-Even Merge Sort with guarded split&merge, and two variants of Samplesort. The experiments are performed on 2- up to 5-dimensional wrapped butterfly networks with 8 up to 160 processors. We ...



Journal title:
  • IBM Journal of Research and Development

Volume 22, Issue 5

Pages  -

Publication date: 1978